A Large Scale Clustering Scheme for Kernel K-Means
نویسندگان
چکیده
Kernel functions can be viewed as a non-linear transformation that increases the separability of the input data by mapping them to a new high dimensional space. The incorporation of kernel function enables the K-Means algorithm to explore the inherent data pattern in the new space. However, the recent applications of kernel KMeans algorithm are confined to small corpora due to its expensive computation and storage cost. To overcome these obstacles, we propose a new clustering scheme which changes the clustering order from the sequence of samples to the sequence of kernels, and employs a diskbased strategy to control data. The new clustering scheme has been demonstrated to be very efficient for large corpus by our experiments on hand-written digits recognition, in which more than 90% of the running time was saved.
منابع مشابه
Efficient Approximation for Large-Scale Kernel Clustering Analysis
Kernel k-means is useful for performing clustering on nonlinearly separable data. The kernel k-means is hard to scale to large data due to the quadratic complexity. In this paper, we propose an approach which utilizes the low-dimensional feature approximation of the Gaussian kernel function to capitalize a fast linear k-means solver to perform the nonlinear kernel k-means. This approach takes a...
متن کاملEuler Clustering
By always mapping data from lower dimensional space into higher or even infinite dimensional space, kernel k-means is able to organize data into groups when data of different clusters are not linearly separable. However, kernel k-means incurs the large scale computation due to the representation theorem, i.e. keeping an extremely large kernel matrix in memory when using popular Gaussian and spa...
متن کاملScalable Kernel Clustering: Approximate Kernel k-means
Kernel-based clustering algorithms have the ability to capture the non-linear structure in real world data. Among various kernel-based clustering algorithms, kernel k -means has gained popularity due to its simple iterative nature and ease of implementation. However, its run-time complexity and memory footprint increase quadratically in terms of the size of the data set, and hence, large data s...
متن کاملDistributed Kernel K-Means for Large Scale Clustering
Clustering samples according to an effective metric and/or vector space representation is a challenging unsupervised learning task with a wide spectrum of applications. Among several clustering algorithms, k-means and its kernelized version have still a wide audience because of their conceptual simplicity and efficacy. However, the systematic application of the kernelized version of k-means is ...
متن کاملApproximate Large-scale Multiple Kernel k-means Using Deep Neural Network
Multiple kernel clustering (MKC) algorithms have been extensively studied and applied to various applications. Although they demonstrate great success in both the theoretical aspects and applications, existing MKC algorithms cannot be applied to large-scale clustering tasks due to: i) the heavy computational cost to calculate the base kernels; and ii) insufficient memory to load the kernel matr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002